whisper : add support for large v3 #1444

ggerganov · 2023-11-07T09:59:18Z

NOTE: re-download ggml-large.bin to get the v3 version

ggml-large.bin is the new v3 model
ggml-large-v2.bin is the old v2 model

./models/download-ggml-model.sh large

This should be ready to merge.

I ran some anecdotal tests using the audio samples in this repo, and it seems that v3 tends to repeat some lines more than v2. This could be a problem on the whisper.cpp side, though I ran one of the audio samples with the OG whisper and it repeats in a similar way:

$ ▶ whisper samples/hp0.wav --model large
/opt/homebrew/lib/python3.11/site-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:11.840]  Henry F. Phillips, from Wikipedia, the free encyclopedia, at en.wikipedia.org
[00:11.840 --> 00:26.140]  Henry F. Phillips, 1890-1958
[00:26.140 --> 00:38.160]  A U.S. businessman from Portland, Oregon, has the honor of having the Phillips head screw and screwdriver named after him.
[00:39.280 --> 00:52.120]  The importance of the cross-head screw design lies in its self-centering property, useful on automated production lines that use powered screwdrivers.
[00:53.760 --> 00:56.120]  Phillips' major contribution was in...
[00:56.140 --> 01:04.640]  driving the cross-head concept forward, to the point where it was adopted by screwmakers and automobile companies.
[01:05.580 --> 01:10.380]  Although he received patents for the design in 1936,
[01:10.380 --> 01:17.720]  U.S. Patent No. 2,046,343
[01:17.720 --> 01:24.100]  U.S. Patents 2,046,837
[01:24.100 --> 01:25.380]  to 2,046,837
[01:26.140 --> 01:30.100]  to 2,046,842
[01:30.100 --> 01:32.340]  to 2,046,840
[01:32.340 --> 01:38.160]  to 2,046,840
[01:38.160 --> 01:47.880]  The American Screw Company was responsible for devising a means of manufacturing the Life FAQ function of Phillips hard come to proving praise by itsọi
[01:47.880 --> 01:50.960]  and licensed their method.
[01:50.960 --> 01:53.580]  Other screw makers of the 1930s dismissed the Phillips concept since...
[01:53.580 --> 01:55.500]  Other screw makers of the 1930s dismissed the Phillips concept since...
[01:55.500 --> 01:56.000]  Other screw makers of the 1930s dismissed the Phillips concept since...
[01:56.000 --> 02:02.240]  since it calls for a relatively complex, recessed socket shape in the head of the screw,
[02:03.080 --> 02:08.160]  as distinct from the simple milled slot of a slotted-type screw.
[02:08.740 --> 02:17.740]  The Phillips Screw Company and the American Screw Company went on to devise the posidrive screw,
[02:18.420 --> 02:25.260]  which differs from the Phillips in that it is designed to accommodate greater torque than the Phillips.
[02:26.000 --> 02:32.920]  An image accompanied this article, captioned, Phillips Screw Head.
[02:34.400 --> 02:39.440]  The following is an infobox which accompanies this article.
[02:40.660 --> 02:45.660]  Infobox, part of the series on screw drive types.
[02:47.160 --> 02:51.560]  Slotted, commonly, erroneously, flathead.
[02:52.820 --> 02:55.220]  Phillips, crosshead.

Anyway, we can't draw any conclusions from this single case, so I will merge this for now and see what people report.
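To make reports like the one above easier to compare, the repetition can be counted rather than eyeballed. The following is a minimal sketch, not part of this PR: it assumes the transcript is in the `[start --> end]  text` format printed above (it also accepts `hh:mm:ss.mmm` timestamps), and the helper names `parse_segments` and `find_repeats` are hypothetical. It only flags consecutive segments with identical text, so near-duplicates such as the drifting patent numbers are not caught.

```python
import re

# Matches lines such as "[01:50.960 --> 01:53.580]  some text".
# The optional group also accepts hh:mm:ss.mmm timestamps (assumed format).
SEGMENT_RE = re.compile(
    r"^\[(\d+:\d{2}(?::\d{2})?\.\d{3}) --> (\d+:\d{2}(?::\d{2})?\.\d{3})\]\s*(.*)$"
)


def parse_segments(transcript):
    """Return (start, end, text) tuples for every timestamped line."""
    segments = []
    for line in transcript.splitlines():
        match = SEGMENT_RE.match(line.strip())
        if match:
            start, end, text = match.groups()
            segments.append((start, end, text.strip()))
    return segments


def find_repeats(segments):
    """Return (first_start, last_end, text, count) for each run of
    consecutive segments whose text is identical (case-insensitive)."""
    runs = []
    i = 0
    while i < len(segments):
        j = i + 1
        while j < len(segments) and segments[j][2].lower() == segments[i][2].lower():
            j += 1
        if j - i > 1:
            runs.append((segments[i][0], segments[j - 1][1], segments[i][2], j - i))
        i = j
    return runs
```

Saving the transcript to a file and calling `find_repeats(parse_segments(open(path).read()))` for a v2 run and a v3 run of the same audio would give two lists of repeated runs that can be compared directly.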

Edit: ran one more example with the OG whisper and this one even produces wrong characters (starts at 01:27.220):

$ ▶ whisper tests/es-0-16khz.wav --model large
/opt/homebrew/lib/python3.11/site-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: Spanish
[00:00.000 --> 00:06.720]  Hola, ¿cómo están todos? Mi nombre es Julián Birrueta Mendoza y en este podcast les vengo
[00:06.720 --> 00:11.780]  a hablar sobre la contaminación del agua. Bueno, empezaré por decir que el ser humano
[00:11.780 --> 00:16.840]  no está midiendo las consecuencias de sus actos. No hay duda que uno de los mayores
[00:16.840 --> 00:21.060]  problemas a los que se enfrentan muchas poblaciones actualmente es la contaminación del agua.
[00:22.740 --> 00:27.220]  Principalmente porque, como bien sabemos, el agua prácticamente es fundamental para
[00:27.220 --> 00:31.340]  la vida, por lo que la contaminación puede ser algo muy negativo para el desarrollo
[00:31.340 --> 00:36.900]  tanto económico como social de los pueblos o de las poblaciones próximas en ese lugar
[00:36.900 --> 00:41.500]  contaminado. Los comienzos de la contaminación, como lo
[00:41.500 --> 00:46.100]  definen muchos expertos en la materia, la contaminación del agua es causada por las
[00:46.100 --> 00:50.760]  actividades humanas. Es un fenómeno ambiental de importancia, el cual se comienza a producir
[00:50.760 --> 00:56.040]  desde los primeros intentos de industrialización para transformarse luego en un problema tan
[00:56.040 --> 00:57.040]  habitual como generalización.
[00:57.220 --> 01:03.340]  Generalmente, la contaminación del agua se produce a través de la introducción directa
[01:03.340 --> 01:11.340]  o indirecta en los acuíferos o cauces de agua, ríos, mares, lagos, océanos, etcétera,
[01:11.340 --> 01:15.180]  o de diversas sustancias que pueden ser consideradas como contaminantes.
[01:15.180 --> 01:22.580]  Pero existen dos formas principales de contaminación del agua. Una de ellas tiene que ver con la
[01:22.580 --> 01:27.200]  contaminación natural del agua, que se corresponde con el ciclo natural de esta contaminación.
[01:27.220 --> 01:29.440]  El régimen de contaminación es basicamente並bada sobre la contaminación y su contenido
[01:29.440 --> 01:33.020]  es declarado como contaminante como un tipo de fuente asiática que dañaría la血as
[01:33.020 --> 01:57.020]  de включar o envianges y reducir la bud
[01:57.220 --> 02:04.200]  Bueno amigos, yo los invito a que no contaminen el agua y que sepan cuidar la naturaleza.
[02:05.100 --> 02:08.840]  Los saluda su buen amigo y compañero Julián Virreta.
[02:10.040 --> 02:10.460]  Nos vemos.

Not sure if I'm doing something wrong - would be helpful if people can confirm this.
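For anyone trying to confirm the wrong-character case, a quick check for letters outside the expected script can help locate the affected segments. This is a minimal sketch under stated assumptions: the function name `unexpected_script_chars` is hypothetical, and it treats any alphabetic character whose Unicode name does not start with `LATIN` as suspicious, which suits a Spanish or English transcript but not languages written in other scripts.

```python
import unicodedata


def unexpected_script_chars(text, allowed_prefixes=("LATIN",)):
    """Return (char, unicode_name) pairs for alphabetic characters whose
    Unicode name does not start with one of the allowed prefixes."""
    flagged = []
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, and whitespace
        name = unicodedata.name(ch, "")
        if not name.startswith(allowed_prefixes):
            flagged.append((ch, name))
    return flagged
```

Running it over each transcript line and printing the lines that return a non-empty list should point at the segments starting around 01:27.220 in the output above, while accented Spanish letters such as `ñ` and `é` pass as Latin.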

ggerganov · 2023-11-07T11:32:11Z

I cannot push to HuggingFace - any idea what is wrong?

git push
batch response: Authorization error. B | 0 B/s
Uploading LFS objects:   0% (0/2), 0 B | 0 B/s, done.
error: failed to push some refs to 'https://huggingface.co/ggerganov/whisper.cpp'

I have created a write access token and have used huggingface-cli login, but the push keeps getting rejected.

Edit: this fixed the issue https://discuss.huggingface.co/t/cant-push-to-new-space/35319/24

Pushing the new model to https://huggingface.co/ggerganov/whisper.cpp

neurostar · 2023-11-07T13:40:13Z

Thanks for quickly supporting the v3 model!
Tested on an M2 MacBook Air; it works great with Metal (fp16, q5).
CoreML conversion does not work yet. Hugging Face transformers needs to be updated (they just committed a patch).

LeiHao0 · 2023-11-07T16:21:16Z

I found the same thing: v3 repeats more than v2, even on VAD-processed audio.

@arabcoders also mentioned in the official whisper repo that it produces more duplicates in Japanese:
openai/whisper#1762
It seems something is wrong inside the model, and I hope it can be fixed soon. For now I have rolled back to v2.

Again, thank you for the quick support of the v3 model and for keeping v2 as well. This is the only whisper project that works like a charm on my Mac Studio with Metal/MPS and even CoreML enabled!

monk1337 · 2023-11-07T21:12:13Z

Awesome, thanks for the quick support ;)

emcodem · 2023-11-08T13:18:41Z

Also, in my first tests I found that large V3 repeats or hallucinates a LOT more than V2. I'm not sure it was a good idea to make V3 the default large model, at least not without the obviously needed changes to mitigate the new repetitions.

whisper : add support for large v3

185d3fd

bobqianic mentioned this pull request Nov 7, 2023

Support for large-v3 #1437

Closed

bench : fix build + fix go bindings

8fb0a1c

ggerganov added 2 commits November 7, 2023 13:45

bench : fix n_mels

a0c0d08

models : update readme

40be742

ggerganov merged commit 2cdfc4e into master Nov 7, 2023
72 of 73 checks passed

vonstring pushed a commit to vonstring/whisper.cpp that referenced this pull request Nov 7, 2023

whisper : add support for large v3 (ggerganov#1444)

ec05db6

* whisper : add support for large v3 * bench : fix build + fix go bindings * bench : fix n_mels * models : update readme

Ayanaminn mentioned this pull request Nov 8, 2023

By the way, is this the latest Large V3? Ayanaminn/N46Whisper#70

Closed

despairTK mentioned this pull request Nov 9, 2023

Some problems with large-v3 Purfview/whisper-standalone-win#100

Closed

felrock pushed a commit to felrock/whisper.cpp that referenced this pull request Nov 18, 2023

whisper : add support for large v3 (ggerganov#1444)

bbf74e5

* whisper : add support for large v3 * bench : fix build + fix go bindings * bench : fix n_mels * models : update readme

landtanin pushed a commit to landtanin/whisper.cpp that referenced this pull request Dec 16, 2023

whisper : add support for large v3 (ggerganov#1444)

6d6c123

* whisper : add support for large v3 * bench : fix build + fix go bindings * bench : fix n_mels * models : update readme

ggerganov mentioned this pull request May 30, 2024

Latest 1.6.2 release substantial increase in hallucinations for large-v3 on CUDA #2191

Open

iThalay pushed a commit to iThalay/whisper.cpp that referenced this pull request Sep 23, 2024

whisper : add support for large v3 (ggerganov#1444)

423e154

* whisper : add support for large v3 * bench : fix build + fix go bindings * bench : fix n_mels * models : update readme
